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Abstract 

Over the years, the most popularly used control chart for statistical process control 
has been Shewhart's X — S or X — R chart along with its multivariate generalizations. 
But, such control charts suffer from the lack of robustness. In this paper, we 
propose a modified and improved version of Shewhart chart, based on trimmed 
mean and winsorized variance that proves robust and more efficient. We have 
generalized this approach of ours with suitable modifications using depth functions 
for Multivariate control charts and EWMA charts as well. We have discussed the 
theoretical properties of our proposed statistics and have shown the efficiency of 
our methodology on univariate and multivariate simulated datasets. We have also 
compared our approach to the other popular alternatives to Shewhart Chart and 
established the efficacy of our methodology. 
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1 Introduction 

Shewhart (1931) introduced control charts in the mid 19th century as means for statistical 
process control [1]. Shewhart developed both the X — R and X — S control charts, but 
initially, owing to its ease of construction, X — R chart was more preferred. However, 
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nowadays, with high speed production processes, large subgroups can be easily obtained. 
Therefore, X — S charts are more relevant because standard deviation is a better measure 
of dispersion than the range. Later on, a multivariate analogue of Shcwhart chart was 
introduced on the basis of the Hotelling's T 2 statistic [3]. A major criticism of the 
Shcwhart chart and its multivariate analogue entails from the fact that both mean (X) 
and standard deviation (S) are non-robust measures of central tendency and dispersion. 
Another drawback of X — S type of control charts is its poor performance under small mean 
shifts. Various charts have been proposed using robust measures like runs test [7], repeated 
median filters [10] or univariate trimmed mean and range [16]. These methods though 
efficient, lack interpretability and are computationally difficult. Liu [1995] proposed a r 
chart based on the ranks of the data points for multivariate data [20] . She asserted that 
the ranks, when scaled to [0,1] will follow a uniform distribution and hence, the empirical 
distribution of the ranks would converge in law to U(0, 1). A problem with this method is 
that it neglects the true values of the observations and considers only relative percentiles. 
As a result, even if an observation undergoes moderate change, the change may not be 
reflected in the r chart. In this paper we have proposed control charts based on trimmed 
mean and winsorized standard deviation and discussed its distributional features. We have 
extended our notion to EWMA (Exponentially Weighted Moving Average) control charts. 
We have also suggested alternatives based on our approach, to Hotclling T 2 charts [3] and 
Multivariate EWMA charts using data depth. 

The distributions of the measures of central tendency and dispersion have been studied 
using re-sampling methods like Bootstrapping. We have compared the ARL (average run 
length) performance of our proposed chart with that of the usual mean and variance 
based control charts. The results of the study have established that in the presence 
of outliers, our proposed control chart clearly outshines the standardl charts, and has 
comparable performance with the latter in absence of outliers. Therefore the use of such 
charts is strongly recommended. All the relevant developments have been discussed in 
the subsequent sections. The programs have been written and evaluated in statistical 
softwares like MATLAB2009a and R2.11.1. 



2 Definition of trimmed mean and winsorized s.d. 

Univariate trimmed mean was proposed by Tukey (1963) as a robust estimate of process 
average [27]. Let x\,x 2 , x n be a sample of size n on measurement of a particular quality 
characteristic. Then the 100a% trimmed mean is defined as 

En—t 
, _ j=t+l X i 

*" (n(l-2a)) 

where a G (0, 1); t = [not + A]. For simplification, Iglewicz and Langenberg (1986) have 
taken t to be the floor of (net + .4) as an approximation [16]. The Tukey trimmed mean 
follows asymptotically normal distribution and its standard error is defined as 

se t = s w /y/n(l - 2a) 

where s w is the 100a% Winsorized standard deviation. However the distribution of set is 
not known. By definition, this statistic is robust. 
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The multivariate definition of trimmed mean depends on the choice of an appropriate 
depth function. 

y £-~<depth(xi)>cutvalue 1 

1 (#(xi, (depth(xi) > outvalue)) 

where depth(xi) is the value of the chosen depth function for the i th vector observa- 
tion Xi and outvalue is the minimum value of depth that would be accepted in data 
trimming. We further define Winsorized variance for multivariate observations to be 
S w — Disp((yi, 7/2, ■ ■ • , Z/n))j where Disp is the estimated dispersion matrix computed 
from the y,'s where jji = Xi if depth(xi) > outvalue and jji = x* otherwise where, x\ is 
the observation x^ with minimum depth above outvalue. 

The four types of depth functions used for trimming by us are, 

1. Spatial depth or L-l data depth (Chaudhuri, P. (1996)) 

2. Tukey depth (Tukey, J. W. (1975)) 

3. Liu depth or Simplicial depth (Liu, R. Y. (1988)) 

4. Oja depth (Oja, H. (1983)) 

the reasons being that they have desirable properties like affine invariance, maximality of 
center, monotonicity w.r.t. the deepest point and vanishing at infinity. 

3 Theoretical Development 

First, we present a theoretical background corresponding to the various control charts that 
form the basis of our study. 

3.1 Modification of the Shewhart chart 

In Shewhart control chart (X — S), the distribution of X and S under the normality 
assumption was used in order to determine the central line and the control limits of the 
chart. If the data comes from a normal distribution, the mean of the data will also follow 
normal distribution and the probability that the mean of the subgroups of size n go beyond 
the ±3cr /y/n limit is as small as 0.0023. We use X ± ZS/y/n as the control limits in the 
X control chart, the standard error of the subgroup means being an unbiased estimator 
of a. The distribution of S under normality of subgroup means was used for constructing 
the S-chart. We propose the use of trimmed mean X t and its standard error, St in 
developing the charts. The control limits of the charts will follow from the distributional 
features of these two quantities. If the observations^) are from N(fi,a 2 ) then one can 
take the transformation Z = (x — \i)j<7 ~ N(0, 1). Thus standard normal distribution is 
taken as the reference distribution for our discussion. It has been observed by us from 
simulation studies that trimmed mean(X t ) is asymptotically normal (Figure lh and S t 
asymptotically follows a Gamma distribution whose parameters can be estimated from the 
data (Figure [2j. 
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3.2 Modification of the EWMA chart 



The usual EWMA control chart building mechanism is as follows. Let Xi be the i th 
subgroup mean, then we define an exponentially weighted average for each i from 1 to 
n, Zi = XXi + (1 — \)Zi_i, A being a constant where Zq = the process target value, 
which we consider to be X, mean of subgroup means, in our study. It is known that 
for large number of subgroups, the in-control ucl and lei may be chosen to be /iq ± 
Lay/ (A)/(2 — A) for suitable choice of L and A respectively with the central line being /j, . 
For our proposed analogue, we replace the subgroup means by 100a% trimmed subgroup 
mean. The mechanism is same as in the case of Shewhart chart, though here we do not 
use any distributional features of X t and S t for forming the control chart, instead we just 
plug in X t and S t as robust estimators of fi an d a in the usual control limits. The choice 
of L and A though are subjective as in the usual case. 

3.3 Modification in Hotelling's T 2 chart 

Assume that we have a process at hand that generates multivariate observations, say p- 
variate observations X =(Xi, X2, ■ ■ ■ , X p ) and we need to ensure that the process stays 
in control over time. One way to do it is to treat variables separately and construct 
control charts for each of them. But that approach besides being laborious and time 
consuming, neglects the correlation among the variables. Hotelling's T 2 control charts 
are the most popular multivariate control charts in general. The Hotclling T 2 statistic, 
(X — j u) T £ _1 (X — 11) follows x 2 p under normality assumption, where p is the number of 
variables, for a sample of size n. Once we estimate £ by S, the unbiased estimator then 
(n-p)/p(X-fi) T S- 1 (X-fi)~F P!n _ p . 

We suggest a modification to the ordinary Hotclling T 2 statistic in order to make it robust. 
The new test statistic proposed is 

r 2 = (X t -^ T S w - 1 (X t - f i). 

where X t and S w are as defined in Sec:2. It is very difficult to determine the distribution 
of t 2 accurately since it depends on various factors such as the subgroup size(fc), choice 
of the depth function {depth), outvalue of trimming etc. So we had to resort Bootstrap 
techniques. 

3.4 Modifications in MEWMA chart 

The univariate definition of EWMA chart easily extends to the multivariate case. For 
p-variate observations, define p-variate vector Zi = AXi + (1 — A)Zj_i, where A denotes 
a p x p matrix , X^ "i"th subgroup mean vector and Z = X. Assume A to be a scalar 
matrix with diagonal element A for simplicity of calculations and T,z — ^x 2 ^' wnere ^ 
is the variance covariancc matrix of subgroup means. We know that (Zi — /j, z ) T Y,~ 1 (Zi — 
fi z ), for mean [i z and variance-covariance matrix Yj z follows x 2 p distribution under the 
normality assumption of Z[s. The quantity to be plotted in the MEWMA control chart 
is (Zi — il z ) T T 1 ^ 1 (Zi — /i z ) for each i , where \l z and S z are the sample counterparts of the 
mean and variance of the Z values respectively. We, however propose to plot the statistic 
Z it = XX it + (1 — A)Zi_i t for "i" th sample value of Z, where X it is the trimmed mean 
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due to the "i"th subgroup. We use 

V = {z if - \i z ) T s zw ~ x {z lt - (i z ) 

as our test statistic for the ith sample value of Z, where S zw is the sample winsorized 
variance-covariance matrix of the Z values. But again it is difficult to find the distribution 
of our proposed statistic and we have to resort to Bootstrap techniques in order to construct 
the control chart. 



4 Comparative simulation study 

4.1 Adopted rules of Insertion of Outliers 

Trimmed mean control chart is expected to give better result than the mean and variance 
based control chart when the process is in control but the data contains some outliers. In 
order to test the comparative performance of our chart, we need to insert certain outliers 
to our simulated data. We adopt the outlier insertion rules due to Gupta and Sengupta 
[10] in this regard. 

• Each subgroup contains a fixed number of outliers. 

• Each outlier is actually a random number from a largely shifted distribution. 

• The place where the outlier is to put is done by randomly choosing the point. 

• The sign of the outlier is taken +/- with probability half to each of them. 

In our studies, for 0.25 shift and for 10% trimming we took one outlier and for 20%trimming 
we inserted two outliers in each subgroup. 

4.2 Univariate Case 

We carried out a number of simulation studies in order to test for the performance of the 
chart proposed by us compared to the standard mechanisms. We considered our reference 
distribution to be standard normal, because given a normal density with specified mean 
and variance, we can use the Z-transformation to bring it to standard normal and apply the 
same set of procedures as discussed above. In Phase I, we simulated 80 rational subgroups 
of fixed subgroup size from the standard normal distribution ( in control distribution) and 
used these to construct the control limits for both Shewhart's X — S and our proposed X t - 
S t charts. In Phase II, 10,000 subgroups of observations are generated from the reference 
distribution, subgroup means are plotted and the ARL is recorded. The subgroup sizes for 
the univariate Shewhart chart have been taken to be 10, 11, 12, 14, 15 and 20, because such 
sample sizes are more used in practice in industrial processes. For EWMA chart, we have 
taken subgroups of size 20 and fixed our L at a convenient level of 3 (standard choice). 
We have taken A to be 0.20,0.25 and 0.40, which are the preferred choices (Crowder 
and Stephen [9], Lucas et al. [22], Hunter [15]) and L to be 3. We have observed the 
performance of the EWMA chart corresponding to three values of A namely 0.4,0.25 and 
0.2. The trimming percentage is usually taken to be 10% or 20% for each subgroup. The 
ucl and Id have been taken to be 95 th and 5 th percentile points for the proposed charts. 
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The control charts presented at the back of our re port (Table [j} and (Table ^ give 
a comparison of our proposed chart with the Shewhart chart and EWMA chart in 3 
scenarios-no shift, under small mean shift and in the presence of an outlier. 



4.3 Multivariate case 
4.3.1 Hotelling's T 2 chart 

For multivariate case ,we do not standardize the data because of the lack of significance of 
it. We take the subgroup size to be 20. For trimming the data ,we have used our previously 
mentioned four depth functions and compared the control chart performance under each 
case. But the data trimming will obviously depend on the choice of the outvalue used 

for trimming. We simulate data from a bivariate normal distribution with u= and 

variance covariance matrix £= (q^ ^ ^ in 1000 subgroups of size 20 each. To get an 

optimal cut-off value we select a large sample say of 100 in control subgroups from the 
process and find the depths of all the points in each subgroup. For MEWMA chart, choice 
of L and A are same as in univariate EWMA chart. Since the distribution of the test 
statistic of interest is not known, we use bootstrapping. 

We first drew 100 subgroups of observations of same subgroup size when the process is 
in control. We selected 1000 subgroups of same size with replacement from these 100 
subgroups and computed trimmed mean (Xi t ) and winsorized dispersion matrix Sj w for 
ith subgroup for a given the depth function. We computed the grand trimmed mean and 
mean of the subgroup winsorized dispersion matrices, 

Eiooo y 



1000 

Eiooo e 
1=1 3 i 



1000 

For each subgroup we computed r 2 for each subgroup and calculated its empirical distri- 
bution. 

For MEWMA chart, a similar procedure of_resampling is adopted. Here, we compute 
Zi t = XXi t + (1 — \)Zi-i for each i, taking X t to be Zq. Then we compute Z tl the grand 
mean of all the Zi t values. We estimate the dispersion matrix of the Z variable 

Sz = 

1000 ip 2 values are obtained and empirical distribution is computed. For distribution of 
both t 2 and ip 2 , we choose 90th percentile point as ucl and as Id as the process is 
out of control only when these statistics are significantly high. We have compared our 
recommended multivariate control chart under various depth functions with the standard 
charts under no shift, small mean shifts and in presence of outliers. (Table [3] and Table 

Next, we have compared our approach with the two approaches due to Liu [20] based 
on ranks and another approach based on MCD estimators by Chenouri et al [28]. We 
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considered a samples of size 20 from N(0, 1) and tested for the comparative efficacy of the 
individual algorithms under mean shifts and presence of outliers. The results are reported 
in Table [5] Liu's method performed exceptionally well in case of mean shifts but was also 
very responsive to the presence of outliers, while MCD method, despite being well adapted 
to the presence of outliers, had poor performance under small mean shifts. Our method 
on the other hand had good performance both under mean shifts and presence of outliers. 
So, the strength of our algorithm as an alternative to standard charts is quite obvious. 

5 Discussions 

• For a process in control or for small mean shifts, there is not much to choose between 
the mean based control chart and our proposed control chart. However, as expected, 
our recommended chart performs way better than the Shewhart and EWMA charts 
and their corresponding multivariate analogs, that too for any choice of depth 
function in multivariate case. 

• We have preferred subgroup size around 20, because very small subgroup sizes often 
lead to unusual fluctuations in the arl values. Some depth functions, like Tukey's 
depth are not at all reliable for small sample sizes and may cause excessive data loss 
on trimming. 

• It has been observed that bad choice of cut-off value often leads to highly fluctuating 
ucl and lei values and the control chart no longer stays very reliable. That is why 
we have considered estimation of the outvalue in the multivariate case to serve this 
purpose, we have considered the distribution of the depth function corresponding to 
all the points and taken a quantile corresponding to the distribution as outvalue so 
that neither is there a huge data loss nor a complete retainment of data in most cases. 

• Some depth functions have limitations of application. Tukey's depth can only assume 
a limited range of values for subgroup sizes like 10 or 20. So, in such cases this depth 
is not at all reliable. Liu depth can be used for bivariate data only and cannot be 
extended to higher 

dimensions . Spatial depth and the Oja depth have been very consistent in their 
performances under various scenarios viz-change in subgroup size, change in A val- 
ues, dimensionality etc as evident from Table [2] and Table |3j so these are more 
preferable under general circumstances compared to Liu and Tukey depth. 

• From bootstrapping up to the process surveillance stage, subgroup size should not 
be changed a lot. That's because the ucl and lei are obtained from an empirical 
distribution for a given sample size. So, with sample size, it is expected to alter as 
well. But it has been seen up to k±2 not much deviation in arl values ins observed. 

• For EWMA control chart we observe that the choice of A and L values play a 
significant role. Though standard L may be taken to be 3, but it is very difficult to 
choose an optimal A uniformly for all choices of depth functions. However we have 
observed that A in the range of 0.20-0.40 gives better ARL performance compared 
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to others. 



• We were not able to get good fits for the tp 2 and r 2 statistics data in most cases, with 
any standard distributions over the entire support. The gamma distribution fits the 
data well for except for the high end values, which leads to lack of fit. Due to the lack 
of any standard distribution fit, we had to resort to Bootstrapping. But in practical 
scenarios, one may still use the gamma distribution in finding the cut-off as the 
Bootstrap is computationally difficult. It will not be a very bad approximation and 
will save time. We present the gamma distribution fits to the empirical distribution 
of t 2 and ip 2 to assert this point. 



References 

[Shewhart(1931)] Shewhart, Walter A. (1931). Economic Control of Quality of Manufac- 
tured Product, New York: D. Van Nostrand Company, Inc. 

[Azzalini(2005)] Azzalini, A. (2005). The skew-normal distribution and related multivari- 
ate families. Scandinavian Journal of Statistics, 32, 159-188. 

[Hotelling(1931)] Hotclling H (1931) The Generalization of Student's Ratio Annals of 
Mathematical Statistics 

[Alfaro(2008)] Alfaro, Jose Luis and Ortega, Juan Fco. (2008). A Robust Alternative to 
Hotellings T 2 Control Chart Using Trimmed Estimators. Quality Reliability Engineering 
International, 24, 601-611. 

[Baguio(2008)] Baguio, C.B. (2008). Trimmed Mean as an Adaptive Robust Estimator of 
a Location Parameter for Weibull Distribution. World Academy of Science, Engineering 
and Technology, 681-686. 

[Barnett(1976)] Barnett, V.(1976). The Ordering of Multivariate Data. Journal of the 
Royal Statistical Society. Series A (General), 139, 318-355. 

[Chambers(1983)] Chambers, J. M., Cleveland, W. S., Kleiner, B. M., and Tukey Paul 
A. (1983). Graphical Methods for Data Analysis, Chapman and Hall, New York, 1983. 

[Chaudhuri(1996)] Chaudhuri, P.(1996). On a Geometric Notion of Quantilcs for 
Multivariate Data. Journal of the American Statistical Association, 91, 862-872. 

[Crowder (1987)] Crowder and Stephen, V.(1987). A Simple Method for Studying Run- 
Length Distributions of Exponentially Weighted Moving Average Charts. Technomet- 
rics, 29, 401-407. 

[Gupta (2008)] Gupta, A. and Sengupta, S.(2008). Online Control Charts for Process 
Averages Based on Repeated Median Filters. Communications in Statistics - Simulation 
and Computation, 37, 178 - 202. 

[Hampel(2011)] Hampcl, F.(2001). Robust statistics: A brief introduction and overview, 
Invited talk in the Symposium Robust Statistics and Fuzzy Techniques in Geodesy and 
GIS held in ETH Zurich, March 12-16, 2001. 



8 



[Hubcr(1972)] Huber, P.J.(1972). The 1972 Wald Lecture: Robust Statistics: A Review. 
Annals of Mathematical Statistics, 43, 1041-1067. 

[Huber(1980)] Huber, P.J. (1980). Robust Statistical Procedure, 2nd edition, CBMS-NSF 
REGIONAL CONFERENCE SERIES IN APPLIED MATHEMATICS, Issue 68. 

[Huber(2002)] Huber, P.J. (2002). John Tukey's Contributions to Robust Statistics. The 
Annals of Statistics, 30, 1640-1648. 

[Hunter(1986)] Hunter, J. S. (1986). The exponentially weighted moving average, Journal 
of Quality Technology, 18, 203-210. 

[Igle(1986)] Iglewicz, B. and Langenberg, P.(1986). Trimmed mean X and R charts. 
Journal of Quality Technology, 18, 152-161. 

[Kochanski(2005)] Kochanski, G. (2005). Brute Force as Statistical Tool. 

[Liu (1988)] Liu, R. Y. (1988). On a notion of simplicial depth, Proceedings of the National 
Academy of Science USA, 85, 1732-1734. 

[Liu(2006)] Liu, R. Y., Serfhng, Robert J. and Souvaine, Diane L.(2006). Data depth: 
robust multivariate analysis, computational geometry, and Applications. DIMACS 
Series in Discrete Mathematics and Theoretical Computer Science, 72, American 
Mathematical Society. 

[Liu (1995)] Liu RY. (1995) Control charts for Multivariate Processes Journal of 
American Statistical Association, Vol. 90 No. 432, 1380-1387 

[Lowry(92)] Lowry, Cynthia A., Woodall, William H., Champ, Charles W.and Rigdon 
Steven E.(1992). Multivariate Exponentially Weighted Moving Average Control Chart, 
Technometrics, 34, 46-53. 

[Lucas(1990)] Lucas, James, M. and Saccucci, Michael S.(1990). Exponentially weighted 
moving average control schemes: properties and enhancements. Technometrics, 32, 
1-29. 

[Mont(1996)] Montgomery, D.C.(2008). Introduction to Statistical Quality Control, 6th 
Edition, WILEY. 

[Oja(1983)] Oja, H. (1983). Descriptive statistics for multivariate distributions. Statistics 
& Probability Letters, 1, 327-332. 

[Roberts(1959)] Roberts, S.W. (1959). Control Chart Tests Based on Geometric Moving 
Averages. Technometrics, 1, 239-250. 

[Stigler(1931)] Stigler, S. M.(1973). The Asymptotic Distribution of Trimmed Mean, The 
Annals of Statistics, 472-477. 

[Tukey(1963)] Tukey, John W. and McLaughin, Donald H.(1963). Less vulner- 
able confidence and significance procedures for local based on a single sam- 
ple (trimming / winsorizing I). Sankhya, 331-352. 

[Chenouri(2007)] Chenouri, Shoja'eddin, Variyath, Asokan M. and Steiner, Stefan 
H. (2007). A Multivariate Robust Control Chart for Individual Observations. 
http: // sas. uwaterloo . ca/ stats_ navigation/ techreports/ 07WorkingPapers/ 
\2007-0l7pif\ 



9 



-i r - 



-i r - 



-i r - 



0.6 



0.4 

0.2 


-0.2 
-0.4 
-0.6 



■Trimmed mean datapoints 
■ Normal distribution 




_i L_ 



_i L_ 



_i L_ 



0:i 0.2 0.3 0.4 0.5 0.6 0.7 0.8 

Probability 



0.9 



Figure 1: QQ plot of trimmed mean vs normal 
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Figure 2: QQ plot of trimmed mean variance vs gamma 
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Method 


Our method 


Liu's method 


MCD based method 




ARL 


sd(ARL) 


ARL 


sd(ARL) 


ARL 


sd(ARL) 


Case 














No shift 


199.34 


91 


159.3 


32.36 


126.4 


43.76 


.25 Shift 


24.9 


6.3 


13.4 


2.6 


60.04 


34.4 


0.5 Shift 


11.02 


1.28 


9.88 


1.44 


29.223 


10.41 


1 Shift 


2.02 


0.074 


1.2 


0.034 


11.52 


1.927 


1 Outlier from iV(3,3) 










104.78 


35.12 16.68 


4.27 


94.33 


22.62 


2 Outliers from JV(3, 3) 










94.66 


28.34 19.27 


5.48 


78.44 


18.16 



Table 5: Comparative ARL performance of our method with Liu's method and 
Chenouri et al's method 
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